Methods

Spatial data preparation

Load data

Maps data

We use three different shapefiles for the continental U.S. land mass, the State waters of maine, new hampshire, massachusets, connecticut, rhode island, new york, new jersey, delaware, maryland, virginia, north carolina and the North East EEZ.

1.- Land shapefile; covers the US land territory for visualization. Data provided from the map package.

2.- State waters; covers the state waters of the NE US states. Data from data.gov.

License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person’s official duties it is considered a U.S. Government Work.

3.- EEZ shapefile; Used the Sea Around Us shapefle updated to June 2016.

unique(land_sf$ID)

Create base shapefile for computation

As a first step we need to divide the NE US EEZ among the different states. For that we expanded state waters up to the 200 nautical miles to then estimate the percentage that each expanded-state-waters occupied. Note that in all cases these areas overlapped and percentages accounted for it. We did this by following these steps:

    1. Expand state waters using a buffer
    1. Grid that buffer on a 0.3 resolution
    1. Crop the buffer to the EEZ

1. Expand Spatial regions (a.k.a buffers)

We set a buffer of 410000 m (410 km, ~ 221 nm) that overshoots the EEZ a bit, but is eventually cropped

2. Expand grid within buffer

Here we expand a grid within the buffer so we can estimate the proportion of each state.

Note: Dark grey shaded area is the grid

2.1 Merge grid and buffers

Once we have a gridded area, we converted the grid to a sf so we can merge it with the buffered states and finally filter out everything outside the states polygon

3. Crop buffers to EEZ

Finally, we crop the grided buffers to within the EEZ to capture the actual water space.

Note: This step takes quite a while because of the size of the EEZ shapefile. No, you cannot use st_simplify() here



# I don't know how to undo st_simplify so need to re-load the shapefile
eez_sf <- st_read(dsn = path_world,
                        layer =file_path_sans_ext(fnam_world)) %>% 
  rename(eez_name = Name) %>% 
  st_transform(crs = 4326) %>% # 4326
  filter(eez_name == "USA (East Coast)")

# Get the overlapping segments
grid_eez_sf <- st_intersection(grid_sf,eez_sf)

# write_sf(grid_eez_sf, paste0(my_path("D", "Spatial"),"grid_eez_sf.shp"))

# Get final df with index
grid_eez_df <- as.data.frame(grid_eez_sf) %>%
  select(state,index) %>% 
  left_join(ne_grid, 
            by = "index")

# write_csv(grid_eez_df, paste0(my_path("R", "Partial"),"grid_eez_df.csv"))

# Plot to make sure makes sense (Picasso syle)
grid_eez_sf %>%
  st_simplify(preserveTopology = TRUE, dTolerance = 10000) %>% #0.1 for paper
  ggplot() +
  geom_sf(aes(color = state), alpha = 0.3)

Interpolation rutine

In this step we interpolate the survey data within the previously created grid following a Triangular Irregular Surface method.

  • We removed cases where wtcpue < 0

Functions needed

We need to create a couple of functions to run the whole process

Interpolation main fx (tis).

This is the main function used to interpolate the data per year. It follows a Triangular Irregular Surface method using the interp::interp() function. If you want to see the function clic on code


tis <- function(input_data, grid, yr, taxa, reg){
  
  # --------------- #
  # Testing
  # print(paste(yr))
  # yr = 1976
  # --------------- #
  
  # Filter data
  data <- input_data %>% 
    filter(year == yr,
           spp == taxa,
           region == reg
    ) %>% 
    # Average duplicated hauls in the same spot
    group_by(region,year,spp,lat,lon) %>%
    summarise_at(vars(wtcpue),
                 mean,na.rm = T) %>%
    filter(wtcpue > 0)
    
  # Only interpolate cases where there is more than 3 rows
  # Marked by the function 
    if(nrow(data) <= 3){
      fit_tin <- tibble()
    }else{
      
      # Triangular Irregular Surface
      fit_tin <- interp::interp( # using {interp}
        x = data$lon,           # the function actually accepts coordinate vectors
        y = data$lat,
        z = data$wtcpue,
        xo = grid$lon,     # here we already define the target grid
        yo = grid$lat,
        output = "points"
      ) %>% 
        bind_cols() %>% 
        bind_cols(grid) %>%
        mutate(year = yr,
               region = reg,
               spp = taxa) %>% 
        select(index, state, year, region, spp, lon=x, lat=y, value = z)
      
    }
   
    return(fit_tin)
}

# Test me
# Needs variables in Control panel
# Test no data: "Alosa aestivalis", reg = "Northeast US Fall", yr = 1974 
# tis(input_data = ocean_data, grid = grid_eez_df, taxa = "Illex illecebrosus", reg = "Northeast US Fall", yr = 1973)



# lapply(years,tis,input_data = ocean_data, grid = grid_eez_df, taxa = "Illex illecebrosus", reg = regions[2])
Run function

This is a sub-function that runs the tis() function by taxa and region. It saves the output as a .csv file for each species.



run_tis <- function(input_data, grid, years, taxa, reg){
  
  
  # Run tis for species and surveys
  for(r in 1:2){
    
    partial_df <- bind_rows(
      lapply(years,tis,input_data = ocean_data, grid = grid_eez_df, taxa = taxa, reg = regions[r])
    )
    
    if(r == 1){
      historic_tif <- partial_df
    }else{
      historic_tif <- bind_rows(historic_tif,partial_df)
    }
    
  }
  
  # ----------------------- #
  # Save dataset per species
  # ----------------------- #
  
  # Set file name
  name <- paste0("tif_",str_replace(taxa," ","_"),".csv")
  
  # Set path name
  save_path <- my_path("R","Partial/Interpolation")
  
  # Set complete path
  save_name <- paste0(save_path,name)
  
  # Create folder if it does not exist
  if(file.exists(save_path) == F){
    dir.create(save_path)
  }
  
  #  Save file
  write_csv(historic_tif,save_name)
  
  return_msg <- paste("Interpolation done for", taxa)
  
  return(return_msg)
  
  
}

# Test me
    # run_tis(input_data = ocean_data, grid = grid_eez_df, taxa = "Centropristis striata", years = years, reg = regions)

Survey data

The interpolation was done with NOAA Northeast Fisheries Science Center Spring and Fall Bottom Trawl Surveys data provided by Ocean adapt. Data was accessed trough the Github.

In primary publications using data from the database, please cite Pinsky et al. 2013. Marine taxa track local climate velocities. Science 341: 1239-1242 doi: 10.1126/science.1239352, as well as the original data sources.

  • Using only the Northeast US Fall and Spring bottom trawl survey data for now
Splitting up data
  • No part on spatial analysis. Can be ignored.

This is just a sub-step to split up the data into single species files. This makes the app faster as it only needs to load species specific data, rather than all the data at de begining.

Control Pannel

This is where we load the required data and prepare to run the interpolation function. Note that some of the data here has been previously created


# Load grid df
grid_eez_df <- my_path("D","Spatial","grid_eez_df.csv", read = T)

# Run interpolation for all years
years <- seq(1973,2019,1)

# regions
regions <- c("Northeast US Fall" , "Northeast US Spring")

# species list
spp <- ocean_data %>% 
  filter(region %in% regions,
         spp != "NA") %>% 
  pull(spp) %>% 
  unique()


# Double check runs

spp_runs <- tibble(taxa = (list.files(my_path("R","Partial/Interpolation")))) %>%
  mutate(
    taxa = str_remove(taxa, "tif_"),
    taxa = str_remove(taxa, ".csv"),
    taxa = str_replace(taxa, "_", " ")
  ) 

# Missing runs
spp <- tibble(taxa=spp) %>% 
  anti_join(spp_runs) %>% 
  pull(taxa)

Run routine

Here we just run the routine for each of the species present in the Northeast US Fall and Spring surveys between 1973 and 2019.

  • Note there are 43 identified taxa
  • Some taxa do not have presence data in some years

# single species run
# run_tis(input_data = ocean_data,
#         grid = grid_eez_df,
#         years = years,
#         reg = regions,
#         taxa = "Illex illecebrosus"
# )


# Run them all in parallel
parallel::mclapply(spp,
                   run_tis, 
                   input_data = ocean_data,
                   grid = grid_eez_df,
                   years = years,
                   reg = regions
)

Results

This section shows preliminary results for Black sea bass (Centropristis striata).


# Load interpolated data
# "Centropristis striata"
# 
historic_tif <- my_path("R","Partial/Interpolation","tif_Centropristis_striata.csv", read = T)

# Spatial data

States <- c("maine", "new hampshire", "massachusetts", "connecticut", "rhode island", "new york", "new jersey", "delaware", "maryland", "virginia", "north carolina","pennsylvania") 

# US State Map (land)

land_sf <- st_as_sf(map("state", plot = FALSE, fill = TRUE)) %>% 
  filter(ID %in% States) 


# Periods
periods <-tibble(
  period = c(rep("a early",12),
             rep("b mid",12),
             rep("c late",12),
             rep("d now",12)
             ),
  year = c(seq(1972,1984,1),
            seq(1985,1997,1),
            seq(1998,2011,1),
            seq(2012,2019,1)
            )
  )

# State pallet

state_pallet <- c(wes_palette(n = 5, name = "Darjeeling1"),
                               wes_palette(n = 5, name = "Darjeeling2"),
                               wes_palette(n = 3, name = "Royal1")
                  )



# print for notebook
head(historic_tif)

Average proportion

This map shows the average proportion of the extrapolated value from the whole study period within each State’s water.

Note: Grey tiles represent NAs

Proportion Change

Here we show the average proportion of the interpolation by State and time period. Time periods were arbitrary defined as;

  • Early; 1972 to 1984
  • Mid; 1985 to 1997
  • Late; 1998 to 2011
  • Now; 2012 to 2019

Notes: To panel represents Fall survey and bottom panel Spring. This computation considers the Overlapping of state waters.


total_fited <- historic_tif %>% 
  group_by(year,region) %>% 
  summarise(total_value = sum(value,na.rm=T),.groups = "drop")

state_fit <- historic_tif %>% 
  group_by(state,year,region) %>% 
  summarise(state_value = sum(value,na.rm= T), .groups = "drop") %>% 
  left_join(total_fited,
            by = c("year","region")) %>%
  mutate(percentage = state_value/total_value*100) %>% 
  left_join(periods) %>% 
  group_by(ID = state,period,region) %>% 
  summarise(mean_per = round(mean(percentage)),.groups = "drop")
Joining, by = "year"
  
# check percentages
# state_fit %>% 
#   group_by(period) %>% 
#   summarise(sum(mean_per))


land_sf %>% 
  left_join(state_fit,
            by = "ID") %>% 
  ggplot() +
  geom_sf(aes(fill = mean_per)) +
  viridis::scale_fill_viridis("Mean Proportion", alpha = 0.8) +
  facet_wrap(~region + period, nrow = 2) +
  my_ggtheme_m()

Area plot

This graph shows the proportion of the interpolation value each State has over time.

Note: This plot is interactive. For ease comparison between States you can;

  • One click on any State to remove it from the plot
  • Two clicks on any State to isolate it from the plot (other states can then be added by just clicking on them).
  • The bottom panel allows you to reduce the time frame

total_fited <- historic_tif %>% 
  group_by(year,region) %>% 
  summarise(total_value = sum(value,na.rm=T),.groups = "drop")

# group by state
state_fit <- historic_tif %>% 
  group_by(state,year,region) %>% 
  summarise(state_value = sum(value,na.rm= T), .groups = "drop") %>% 
  left_join(total_fited,
            by = c("year","region")) %>%
  mutate(percentage = state_value/total_value*100)

p <- ggplot(state_fit) +
  geom_area(
    aes(
      x = year,
      y = round(percentage),
      fill = state
    )
  ) +
  ylab("Percentage (%)") +
  # viridis::scale_fill_viridis(discrete = T, alpha = 0.8) +
  scale_fill_manual(values = state_pallet) +
  MyFunctions::my_ggtheme_p() +
  facet_wrap(~region, ncol = 1)

ggplotly(p,
         dynamicTicks = TRUE) %>% 
  layout(hovermode = "x") %>%
  rangeslider()
NA

Area plot (running mean)

This graph shows the proportion of each State smoothed over a *10 years running mean**. It allows to better see increasing and decreasing trends.

Note: This plot is also interactive and thus, follows the same options of the previous one.


# group by state
state_fit <- historic_tif %>% 
  group_by(state,year,region,.groups = "drop") %>% 
  summarise(state_value = sum(value,na.rm= T), .groups = "drop") %>% 
  left_join(total_fited,
            by = c("year","region")) %>%
  mutate(percentage = state_value/total_value*100) %>% 
  group_by(state,region) %>% 
  mutate(RMean = zoo::rollmean(x = percentage, 
                            10,
                            fill = NA,
                            na.rm=T)
    )

# Plot me
p <- ggplot(state_fit) +
  geom_area(
    aes(
      x = year,
      y = round(RMean),
      fill = state
    )
  ) +
  ylab("10 yrs running average (%)") +
  # viridis::scale_fill_viridis(discrete = T, alpha = 0.8) +
  scale_fill_manual(values = state_pallet) +
  MyFunctions::my_ggtheme_p() +
  facet_wrap(~region, ncol = 1)

suppressWarnings(
ggplotly(p,
         dynamicTicks = TRUE) %>% 
  layout(hovermode = "x") %>% 
  # add_trace() %>% 
  rangeslider()
)
NA
---
title: "Spatial Analysis"
author: "Juliano Palacios Abrantes"
output:
  html_notebook:
    fig_width: 6
    fig_height: 4
    code_folding: hide
    toc: yes
    toc_depth: 3
    toc_float:
      collapsed: false
    theme: darkly
---

```{r setup, include=FALSE}
library(MyFunctions)
MyFunctions::my_lib(c("ggmap","sf","tidyverse","tools","readr","data.table","maps","viridis","wesanderson","knitr","kableExtra","plotly"))
```

# Methods

## Spatial data preparation

### Load data

#### Maps data

We use three different shapefiles for the continental U.S. land mass, the State waters of maine, new hampshire, massachusets, connecticut, rhode island, new york, new jersey, delaware, maryland, virginia, north carolina and the North East EEZ.

1.- **Land shapefile;** covers the US land territory for visualization. Data provided from the `map` package.

2.- **State waters;** covers the state waters of the NE US states. Data from [data.gov](https://catalog.data.gov/dataset/federal-and-state-waters).  

*License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties it is considered a U.S. Government Work.*

3.- **EEZ shapefile;** Used the *Sea Around Us* shapefle updated to June 2016.

```{r maps_sf, eval = T, echo = T, results = 'hide'}

States <- c("maine", "new hampshire", "massachusetts", "connecticut", "rhode island", "new york", "new jersey", "delaware", "maryland", "virginia", "north carolina","pennsylvania") 

# US State Map (land)

land_sf <- st_as_sf(map("state", plot = FALSE, fill = TRUE)) %>% 
  filter(ID %in% States) 

# ggplot(us_map) +
  # geom_sf()

# US EEZ map

# path_world <- MyFunctions::my_path("G", extra_path = "Spatial/SAU/SAU_Shapefile")
path_world <- "~/Downloads/SAU_Shapefile"

# The File
fnam_world <- "SAUEEZ_July2015.shp"

# Load it!
eez_sf <- st_read(dsn = path_world,
                        layer =file_path_sans_ext(fnam_world)) %>% 
  rename(eez_name = Name) %>% 
  st_transform(crs = 4326) %>% # 4326
  st_simplify(preserveTopology = TRUE, dTolerance = 10000) %>% #0.1 for paper
  filter(eez_name == "USA (East Coast)")


# ggplot(eez_sf) +
# geom_sf(aes(),fill =NA)

# ggplot() +
#   geom_sf(data = World_eez_sf, aes(), color = "red") +
#   geom_sf(data = us_map, aes(), color = "blue")

# US state waters
# https://catalog.data.gov/dataset/federal-and-state-waters

# us_state_w <- rgdal::readOGR(dsn = "~/Downloads/FederalAndStateWaters/FederalAndStateWaters.gdb")
state_sf <-  st_read("~/Downloads/FederalAndStateWaters/FederalAndStateWaters.gdb") %>%
  janitor::clean_names() %>% 
  mutate(jurisdicti = str_to_lower(jurisdicti)) %>% 
  filter(jurisdicti %in% States) %>% 
  rename(state = jurisdicti) %>% 
  # st_transform(crs = 4326)  # for matching projections
  st_simplify(preserveTopology = TRUE, dTolerance = 10000) #0.1 for paper


grid_sf <-  st_read("/Volumes/Enterprise/Data/AcrossBoundaries/Data/Spatial/grid_sf/grid_eez_sf.shp")

unique(land_sf$ID)

```


### Create base shapefile for computation

As a first step we need to divide the NE US EEZ among the different states. For that we expanded state waters up to the 200 nautical miles to then estimate the percentage that each expanded-state-waters occupied. Note that in all cases these areas overlapped and percentages accounted for it. We did this by following these steps:

- 1. Expand state waters using a buffer

- 2. Grid that buffer on a 0.3 resolution

- 3. Crop the buffer to the EEZ

#### 1. Expand Spatial regions (a.k.a  buffers)

We set a buffer of *410000* m (410 km, ~ 221 nm) that overshoots the EEZ a bit, but is eventually cropped

```{r buffer, eval = T, echo = T}

# Buffer state waters
state_bf = st_buffer(state_sf, 410000) %>% 
  st_transform(4326) %>% # to match shape
  st_set_crs(4326)

# ------------------------------ #
# Testing and visualizing the buffer 
# ------------------------------ #

state_bf_plot <- ggplot() +
  geom_sf(data = eez_sf, aes(), fill = NA) +
  geom_sf(data = state_bf, aes(color = state), fill = NA) +
  scale_color_manual(values = state_pallet)
#   

# ggsave("../Results/Partial/state_waters_buffer.png",state_bf_plot)

# ------------------------------ #

state_bf_plot

```

#### 2. Expand grid within buffer

Here we expand a grid within the buffer so we can estimate the proportion of each state.

*Note:* Dark grey shaded area is the grid

```{r grid_indexing, eval = T, echo = T, message = F, warning = F}

# Create grid of the region
bbox <- c(st_bbox(state_bf))

# Expand the grid
ne_grid <- expand.grid(
    lon = seq(from = bbox["xmin"], to = bbox["xmax"], by = 0.3),
    lat = seq(from = bbox["ymin"], to = bbox["ymax"], by = 0.3)
) %>% 
  rowid_to_column("index")

# ----------- #
# [TEST] Plot all layers
# ----------- #

# Looks good
state_sf %>%
  st_transform(4326) %>% # to match shape
  st_set_crs(4326) %>%
  st_simplify(preserveTopology = TRUE, dTolerance = 10000) %>%
  ggplot() +
  geom_sf(aes(color = state))+
  # geom_sf(data = eez_sf, aes(),fill =NA) +
  # ggplot()+
  geom_tile(data = ne_grid,
            aes(
              x = lon,
              y = lat
            ),
            alpha = 0.2
  ) +
  scale_color_manual(values = state_pallet)


```

##### 2.1 Merge grid and buffers

Once we have a gridded area, we converted the grid to a `sf` so we can merge it with the buffered states and finally filter out everything outside the states polygon

```{r grid_buf_merge, eval = T, echo = T, message = F, warning = F, fig.width = 10}

# ---------------- #
# Convert grid to sf
# ---------------- #
grid_sf <- st_as_sf(ne_grid,
             coords = c("lon", "lat"),
             crs = 4326) %>% 
  st_join(state_bf) %>% 
  filter(!is.na(state))

# Create data frame for future computations
# Note, will be used in next chunk
grid_bf_dt <- as.data.frame(grid_sf) %>%
    select(index,state)
    # group_by(state) %>% 
    # summarise(length(index))


# ---------------- #
# [TEST] 
# Visualization of grid
# ---------------- #

gridExtra::grid.arrange(
  # Overall (overlapping) position
  ggplot(grid_sf) +
    geom_sf(aes(color = state), alpha = 0.3) +
    geom_sf(data = eez_sf, aes(),fill =NA) + 
    scale_color_manual(values = state_pallet) +
    theme(legend.position = ""),
  # Showing each state separatley
  ggplot() +
    geom_sf(data = grid_sf, aes(color = state),size = 0.1, alpha = 0.5) +
    facet_wrap(~state) +
    theme(legend.position = "top") +
    scale_color_manual(values = state_pallet),
  nrow = 1)

```

#### 3. Crop buffers to EEZ

Finally, we crop the grided buffers to within the EEZ to capture the actual water space.

*Note:* This step takes quite a while because of the size of the EEZ shapefile. No, you cannot use `st_simplify()` here

```{r buff_to_eez, eval =T, echo = T, message = F, warning = F}


# I don't know how to undo st_simplify so need to re-load the shapefile
eez_sf <- st_read(dsn = path_world,
                        layer =file_path_sans_ext(fnam_world)) %>% 
  rename(eez_name = Name) %>% 
  st_transform(crs = 4326) %>% # 4326
  filter(eez_name == "USA (East Coast)")

# Get the overlapping segments
grid_eez_sf <- st_intersection(grid_sf,eez_sf)

# write_sf(grid_eez_sf, paste0(my_path("D", "Spatial"),"grid_eez_sf.shp"))

# Get final df with index
grid_eez_df <- as.data.frame(grid_eez_sf) %>%
  select(state,index) %>% 
  left_join(ne_grid, 
            by = "index")

# write_csv(grid_eez_df, paste0(my_path("R", "Partial"),"grid_eez_df.csv"))

# Plot to make sure makes sense (Picasso syle)
grid_eez_sf %>%
  st_simplify(preserveTopology = TRUE, dTolerance = 10000) %>% #0.1 for paper
  ggplot() +
  geom_sf(aes(color = state), alpha = 0.3)

```

### Interpolation rutine

In this step we [interpolate](https://swilke-geoscience.net/post/spatial_interpolation/) the survey data within the previously created grid following a Triangular Irregular Surface method.

- We removed cases where `wtcpue < 0`

#### Functions needed

We need to create a couple of functions to run the whole process

##### Interpolation main fx (tis).

This is the main function used to interpolate the data per year. It follows a Triangular Irregular Surface method using the `interp::interp()` function. If you want to see the function clic on `code`

```{r interpol_function, eval = T, echo = T}

tis <- function(input_data, grid, yr, taxa, reg){
  
  # --------------- #
  # Testing
  # print(paste(yr))
  # yr = 1976
  # --------------- #
  
  # Filter data
  data <- input_data %>% 
    filter(year == yr,
           spp == taxa,
           region == reg
    ) %>% 
    # Average duplicated hauls in the same spot
    group_by(region,year,spp,lat,lon) %>%
    summarise_at(vars(wtcpue),
                 mean,na.rm = T) %>%
    filter(wtcpue > 0)
    
  # Only interpolate cases where there is more than 3 rows
  # Marked by the function 
    if(nrow(data) <= 3){
      fit_tin <- tibble()
    }else{
      
      # Triangular Irregular Surface
      fit_tin <- interp::interp( # using {interp}
        x = data$lon,           # the function actually accepts coordinate vectors
        y = data$lat,
        z = data$wtcpue,
        xo = grid$lon,     # here we already define the target grid
        yo = grid$lat,
        output = "points"
      ) %>% 
        bind_cols() %>% 
        bind_cols(grid) %>%
        mutate(year = yr,
               region = reg,
               spp = taxa) %>% 
        select(index, state, year, region, spp, lon=x, lat=y, value = z)
      
    }
   
    return(fit_tin)
}

# Test me
# Needs variables in Control panel
# Test no data: "Alosa aestivalis", reg = "Northeast US Fall", yr = 1974 
# tis(input_data = ocean_data, grid = grid_eez_df, taxa = "Illex illecebrosus", reg = "Northeast US Fall", yr = 1973)



# lapply(years,tis,input_data = ocean_data, grid = grid_eez_df, taxa = "Illex illecebrosus", reg = regions[2])

```

##### Run function

This is a sub-function that runs the `tis()` function by taxa and region. It saves the output as a .csv file for each species. 

```{r interpol_run_fun, eval = T, echo = T}


run_tis <- function(input_data, grid, years, taxa, reg){
  
  
  # Run tis for species and surveys
  for(r in 1:2){
    
    partial_df <- bind_rows(
      lapply(years,tis,input_data = ocean_data, grid = grid_eez_df, taxa = taxa, reg = regions[r])
    )
    
    if(r == 1){
      historic_tif <- partial_df
    }else{
      historic_tif <- bind_rows(historic_tif,partial_df)
    }
    
  }
  
  # ----------------------- #
  # Save dataset per species
  # ----------------------- #
  
  # Set file name
  name <- paste0("tif_",str_replace(taxa," ","_"),".csv")
  
  # Set path name
  save_path <- my_path("R","Partial/Interpolation")
  
  # Set complete path
  save_name <- paste0(save_path,name)
  
  # Create folder if it does not exist
  if(file.exists(save_path) == F){
    dir.create(save_path)
  }
  
  #  Save file
  write_csv(historic_tif,save_name)
  
  return_msg <- paste("Interpolation done for", taxa)
  
  return(return_msg)
  
  
}

# Test me
    # run_tis(input_data = ocean_data, grid = grid_eez_df, taxa = "Centropristis striata", years = years, reg = regions)

```


#### Survey data

The interpolation was done with NOAA Northeast Fisheries Science Center Spring and Fall Bottom Trawl Surveys [data](https://www.fisheries.noaa.gov/region/new-england-mid-atlantic#science) provided by [Ocean adapt](https://oceanadapt.rutgers.edu/). Data was accessed trough the [Github](https://github.com/pinskylab/OceanAdapt).

*In primary publications using data from the database, please cite Pinsky et al. 2013. Marine taxa track local climate velocities. Science 341: 1239-1242 doi: 10.1126/science.1239352, as well as the original data sources.*

- Using only the Northeast US Fall and Spring bottom trawl survey data for now

##### Splitting up data

- No part on spatial analysis. Can be ignored. 

This is just a sub-step to split up the data into single species files. This makes the app faster as it only needs to load species specific data, rather than all the data at de begining. 

```{r dat_exploded, eval = T, echo = F, fig.width = 9}

ocean_data <- readRDS("/Volumes/Enterprise/Data/pinskylab-OceanAdapt-966adf0/data_clean/dat_exploded.rds") #%>% 
  # filter(spp == "Centropristis striata",
       # region %in% c("Northeast US Fall" , "Northeast US Spring")) #No more seasons


spp_data <- function(spp){
  
  name_save <- paste0(my_path("D","Spp/Observation"),"obs_",str_replace(spp[1], " ", "_"),".csv")

ocean_data %>% 
  filter(spp == spp,
       region %in% c("Northeast US Fall" , "Northeast US Spring")
       ) %>% 
  write_csv(., name_save)
  
}

spp_list <- ocean_data %>% 
  filter(region %in% regions,
         spp != "NA") %>% 
  pull(spp) %>% 
  unique()
  

lapply(spp_list, spp_data)


subset_data <- ocean_data %>% 
  filter(spp == "Centropristis striata",
       region %in% c("Northeast US Fall" , "Northeast US Spring")
       )

ggplot() +
  geom_point(data = subset(subset_data, wtcpue = 0.0),
             aes(
               x = lon,
               y = lat
             ),
             color = "grey95"
  ) +
  geom_point(data = subset(subset_data, wtcpue > 0),
             aes(
               x = lon,
               y = lat,
               color = log10(wtcpue)
             ),
             size = 1
  ) +
  scale_color_distiller(palette = "Spectral", 
                        guide_legend(title = "WCPUE per Haul (log10)")) + 
  coord_sf(xlim = c(-76, -65),ylim = c(35, 45)) +
  MyFunctions::my_ggtheme_m() +
  facet_wrap(~region)

```

#### Control Pannel

This is where we load the required data and prepare to run the interpolation function. Note that some of the data here has been previously created 

```{r contro_pannel, eval = T, echo = T}

# Load grid df
grid_eez_df <- my_path("D","Spatial","grid_eez_df.csv", read = T)

# Run interpolation for all years
years <- seq(1973,2019,1)

# regions
regions <- c("Northeast US Fall" , "Northeast US Spring")

# species list
spp <- ocean_data %>% 
  filter(region %in% regions,
         spp != "NA") %>% 
  pull(spp) %>% 
  unique()


# Double check runs

spp_runs <- tibble(taxa = (list.files(my_path("R","Partial/Interpolation")))) %>%
  mutate(
    taxa = str_remove(taxa, "tif_"),
    taxa = str_remove(taxa, ".csv"),
    taxa = str_replace(taxa, "_", " ")
  ) 

# Missing runs
spp <- tibble(taxa=spp) %>% 
  anti_join(spp_runs) %>% 
  pull(taxa)


```

#### Run routine

Here we just run the routine for each of the species present in the Northeast US Fall and Spring surveys between 1973 and 2019. 

- Note there are 43 identified taxa
- Some taxa do not have presence data in some years

```{r run_routine, eval = T, echo = T}

# single species run
# run_tis(input_data = ocean_data,
#         grid = grid_eez_df,
#         years = years,
#         reg = regions,
#         taxa = "Illex illecebrosus"
# )


# Run them all in parallel
parallel::mclapply(spp,
                   run_tis, 
                   input_data = ocean_data,
                   grid = grid_eez_df,
                   years = years,
                   reg = regions
)


```

# Results

This section shows preliminary results for Black sea bass (*Centropristis striata*).

```{r load_data, eval = T, echo = T}

# Load interpolated data
# "Centropristis striata"
# 
historic_tif <- my_path("R","Partial/Interpolation","tif_Centropristis_striata.csv", read = T)

# Spatial data

States <- c("maine", "new hampshire", "massachusetts", "connecticut", "rhode island", "new york", "new jersey", "delaware", "maryland", "virginia", "north carolina","pennsylvania") 

# US State Map (land)

land_sf <- st_as_sf(map("state", plot = FALSE, fill = TRUE)) %>% 
  filter(ID %in% States) 


# Periods
periods <-tibble(
  period = c(rep("a early",12),
             rep("b mid",12),
             rep("c late",12),
             rep("d now",12)
             ),
  year = c(seq(1972,1984,1),
            seq(1985,1997,1),
            seq(1998,2011,1),
            seq(2012,2019,1)
            )
  )

# State pallet

state_pallet <- c(wes_palette(n = 5, name = "Darjeeling1"),
                               wes_palette(n = 5, name = "Darjeeling2"),
                               wes_palette(n = 3, name = "Royal1")
                  )



# print for notebook
head(historic_tif)
```


## Average proportion

This map shows the average proportion of the extrapolated value from the whole study period within each State's water.

*Note:* Grey tiles represent `NAs`

```{r area_map_state, eval = T, echo = T, fig.width=10, fig.height=12}

 data_grid <- historic_tif %>% 
  # left_join(periods) %>% 
  # group_by(state,lat,lon,period) %>% 
  group_by(state,lat,lon,region) %>% 
  summarise(mean = mean(value,na.rm= T), .groups = "drop") %>% 
  rename(ID = state)

ggplot(data_grid) +
  geom_tile(
    aes(
      x = lon,
      y = lat,
      fill = mean,
      col = mean
    )
  ) +
  facet_wrap(~ ID  +  region,
             ncol = 4) +
  scale_fill_viridis("Mean Proportion", alpha = 0.8) +
  scale_color_viridis("Mean Proportion", alpha = 0.8)
```

## Proportion Change

Here we show the average proportion of the interpolation by State and time period. Time periods were arbitrary defined as;

- Early; 1972 to 1984
- Mid; 1985 to 1997
- Late; 1998 to 2011
- Now; 2012 to 2019

*Notes:* To panel represents Fall survey and bottom panel Spring. This computation considers the Overlapping of state waters.

```{r area_map, eval = T, echo = T, fig.width = 10}

total_fited <- historic_tif %>% 
  group_by(year,region) %>% 
  summarise(total_value = sum(value,na.rm=T),.groups = "drop")

state_fit <- historic_tif %>% 
  group_by(state,year,region) %>% 
  summarise(state_value = sum(value,na.rm= T), .groups = "drop") %>% 
  left_join(total_fited,
            by = c("year","region")) %>%
  mutate(percentage = state_value/total_value*100) %>% 
  left_join(periods,
            by = "year") %>% 
  group_by(ID = state,period,region) %>% 
  summarise(mean_per = round(mean(percentage)),.groups = "drop")
  
# check percentages
# state_fit %>% 
#   group_by(period) %>% 
#   summarise(sum(mean_per))


land_sf %>% 
  left_join(state_fit,
            by = "ID") %>% 
  ggplot() +
  geom_sf(aes(fill = mean_per)) +
  viridis::scale_fill_viridis("Mean Proportion", alpha = 0.8) +
  facet_wrap(~region + period, nrow = 2) +
  my_ggtheme_m()

```

## Area plot

This graph shows the proportion of the interpolation value each State has over time.

*Note:* This plot is interactive. For ease comparison between States you can;

- *One* click on any State to *remove* it from the plot 
- *Two* clicks on any State to *isolate it* from the plot (other states can then be added by just clicking on them).
- The bottom panel allows you to reduce the time frame

```{r area_plot, eval = T, echo = T, fig.height = 10, fig.width = 10}

total_fited <- historic_tif %>% 
  group_by(year,region) %>% 
  summarise(total_value = sum(value,na.rm=T),.groups = "drop")

# group by state
state_fit <- historic_tif %>% 
  group_by(state,year,region) %>% 
  summarise(state_value = sum(value,na.rm= T), .groups = "drop") %>% 
  left_join(total_fited,
            by = c("year","region")) %>%
  mutate(percentage = state_value/total_value*100)

p <- ggplot(state_fit) +
  geom_area(
    aes(
      x = year,
      y = round(percentage),
      fill = state
    )
  ) +
  ylab("Percentage (%)") +
  # viridis::scale_fill_viridis(discrete = T, alpha = 0.8) +
  scale_fill_manual(values = state_pallet) +
  MyFunctions::my_ggtheme_p() +
  facet_wrap(~region, ncol = 1)

ggplotly(p,
         dynamicTicks = TRUE) %>% 
  layout(hovermode = "x") %>%
  rangeslider()

```

## Area plot (running mean)

This graph shows the proportion of each State smoothed over a *10 years running mean**. It allows to better see increasing and decreasing trends.

*Note:* This plot is also interactive and thus, follows the same options of the previous one.

```{r area_lm_plot, eval = T, echo = T, fig.height = 10, fig.width = 10}

# group by state
state_fit <- historic_tif %>% 
  group_by(state,year,region,.groups = "drop") %>% 
  summarise(state_value = sum(value,na.rm= T), .groups = "drop") %>% 
  left_join(total_fited,
            by = c("year","region")) %>%
  mutate(percentage = state_value/total_value*100) %>% 
  group_by(state,region) %>% 
  mutate(RMean = zoo::rollmean(x = percentage, 
                            10,
                            fill = NA,
                            na.rm=T)
    )

# Plot me
p <- ggplot(state_fit) +
  geom_area(
    aes(
      x = year,
      y = round(RMean),
      fill = state
    )
  ) +
  ylab("10 yrs running average (%)") +
  # viridis::scale_fill_viridis(discrete = T, alpha = 0.8) +
  scale_fill_manual(values = state_pallet) +
  MyFunctions::my_ggtheme_p() +
  facet_wrap(~region, ncol = 1)

suppressWarnings(
ggplotly(p,
         dynamicTicks = TRUE) %>% 
  layout(hovermode = "x") %>% 
  # add_trace() %>% 
  rangeslider()
)

```
